Abstract: It is now well known that sparse or compressible vectors can be stably recovered
from their low-dimensional projection, provided the projection matrix satisfies a Restricted
Isometry Property (RIP). We establish new implications of the RIP with respect to nonlinear
approximation in a Hilbert space with a redundant frame. The main ingredients of our
approach are: a) Jackson and Bernstein inequalities, associated to the characterization of certain
approximation spaces with interpolation spaces; b) a new proof that for overcomplete frames
which satisfy a Bernstein inequality, these interpolation spaces are nothing but the collection of
vectors admitting a representation in the dictionary with compressible coefficients; c) the proof
that the RIP implies Bernstein inequalities. As a result, we obtain that in most overcomplete
random Gaussian dictionaries with fixed aspect ratio, just as in any orthonormal basis, the
error of best m-term approximation of a vector decays at a certain rate if, and only if, the vector
admits a compressible expansion in the dictionary. Yet, for mildly overcomplete dictionaries
with a one-dimensional kernel, we give examples where the Bernstein inequality holds, but
the same inequality fails for even the smallest perturbation of the dictionary.
On the restricted isometry property and nonlinear approximation
with a redundant dictionary

Summary (translated from the French résumé): It is now well established that sparse or
compressible vectors can be stably estimated from their low-dimensional projection, as soon as the
projection matrix satisfies a Restricted Isometry Property (RIP). We establish new
consequences of the RIP with respect to nonlinear approximation in a Hilbert space
with a redundant frame. The main ingredients of our approach are: a) Jackson and
Bernstein inequalities, associated with characterizations of certain approximation spaces
in terms of interpolation spaces; b) the proof that for frames satisfying the said Bernstein
inequality, the interpolation spaces in question are nothing but the set of vectors having a
representation with compressible coefficients in the dictionary; c) the proof that the RIP
implies Bernstein inequalities. A consequence of these results is that most random Gaussian
dictionaries with fixed redundancy factor behave like an orthogonal basis from the point
of view of approximation spaces: the optimal m-term approximation error of a vector decays
at a certain rate as a function of m if, and only if, the vector admits a compressible
representation in the dictionary. However, we show that there also exist dictionaries of
minimal redundancy, whose kernel is of dimension one, for which the Bernstein inequality,
although satisfied, can be made to fail when the dictionary undergoes an arbitrarily small
perturbation.

Keywords: Bernstein inequality, random dictionary, restricted isometry condition
1 Introduction

Data approximation using sparse linear expansions from overcomplete dictionaries has become
a central theme in signal and image processing, with applications ranging from data acquisition
(compressed sensing) to denoising and compression. Dictionaries can be seen as collections
of vectors $\{\varphi_j\}$ from a Banach space $X$ equipped with a norm $\|\cdot\|_X$, and one wishes to
approximate data vectors $f$ using $k$-term expansions $\sum_{j \in I} c_j \varphi_j$ where $I$ is an index set of size $k$.
Formally, using the matrix notation $\Phi c = \sum_j c_j \varphi_j$ and denoting $\|c\|_0 = \sharp\{j,\ c_j \neq 0\}$ the number
of nonzero components in the vector $c$, we can define the (nonlinear) set of all such $k$-term
expansions

$$\Sigma_k(\Phi) := \{\Phi c,\ \|c\|_0 \le k\}.$$



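1.1 Best k-term approximation

A first question we may want to answer, for each data vector $f$, is: how well can we approximate
it using elements of $\Sigma_k(\Phi)$? The error of best $k$-term approximation is a quantitative answer for
a fixed $k$:

$$\sigma_k(f, \Phi) := \inf_{y \in \Sigma_k(\Phi)} \|f - y\|_X.$$

A more global view is given by the largest approximation rate $s > 0$ such that¹

$$\sigma_k(f, \Phi) \lesssim k^{-s}, \quad \forall k \ge 1.$$

To measure more finely the rate of approximation, one defines for $0 < q < \infty$ [8, Chapter 7,
Section 9]

$$|f|_{\mathcal A^s_q(\Phi)} := \Big( \sum_{k \ge 1} \big[ k^s \sigma_k(f, \Phi) \big]^q k^{-1} \Big)^{1/q} \tag{1.1}$$

and the associated approximation spaces

$$\mathcal A^s_q(\Phi) := \{ f \in X,\ |f|_{\mathcal A^s_q(\Phi)} < \infty \}. \tag{1.2}$$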
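To make the error $\sigma_k(f, \Phi)$ concrete, here is a minimal sketch for the simplest possible setting, assumed here purely for illustration: $\Phi$ is an orthonormal basis of a Hilbert space, so that $f$ is identified with its coefficient vector and the best $k$-term approximant keeps the $k$ largest coefficients in magnitude. The function name `best_k_term_error` and the sample coefficients are hypothetical.

```python
import math

def best_k_term_error(coeffs, k):
    """Best k-term approximation error in an orthonormal basis (assumption):
    the l2 norm of all coefficients except the k largest in magnitude."""
    tail = sorted((abs(c) for c in coeffs), reverse=True)[k:]
    return math.sqrt(sum(t * t for t in tail))

# Hypothetical coefficient vector of f in the orthonormal basis.
coeffs = [3.0, -4.0, 0.0, 1.0, 2.0]
errors = [best_k_term_error(coeffs, k) for k in range(len(coeffs) + 1)]
print(errors)
```

For a redundant dictionary no such closed form is available in general, which is why the rest of the paper relates $\sigma_k$ to sparsity of representations instead of computing it directly.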



1.2 Sparse or compressible representations 

Alternatively, we may be interested in sparse / compressible representations of $f$ in the dictionary.
Suppose the vectors forming $\Phi$ are quasi-normalized in $X$: for all $j$, $0 < c \le \|\varphi_j\|_X \le C < \infty$.
Then using $\ell^\tau$ (quasi)-norms (in particular, $0 < \tau \le 1$) one defines²

$$\|f\|_{\ell^\tau(\Phi)} := \inf_{c,\ \Phi c = f} \|c\|_\tau \tag{1.3}$$

and the associated sparsity spaces (also called smoothness spaces, for when $\Phi$ is, e.g., a wavelet
frame, they indeed characterize smoothness on the Besov scale)

$$\ell^\tau(\Phi) := \{ f,\ \|f\|_{\ell^\tau(\Phi)} < \infty \}.$$

¹ The notation $a \lesssim b$ indicates the existence of a finite constant $C$ such that $a \le C \cdot b$. The notation $a \asymp b$ means that
we have both $a \lesssim b$ and $b \lesssim a$. As usual, $C$ will denote a generic finite constant, independent from the other quantities
of interest. Different occurrences of this notation in the paper may correspond to different values of the constant.

² It has been shown in [11] that under mild assumptions on the dictionary, such as Eq. (1.4), the definition (1.3) is
fully equivalent to the more general topological definition of $\|f\|_{\ell^\tau(\Phi)}$ introduced in [8].
1.3 Direct and inverse estimates

Interestingly, the above defined concepts are related. In a Hilbert space $X = \mathcal H$, when $\Phi$
satisfies the upper bound

$$\|\Phi c\|_{\mathcal H}^2 \le B \cdot \|c\|_2^2, \quad \forall c \in \ell^2 \tag{1.4}$$

the sparsity spaces for $0 < \tau \le 2$ are characterized as

$$\ell^\tau(\Phi) = \{ f,\ \exists c,\ \|c\|_\tau < \infty,\ f = \Phi c \} = \Phi \ell^\tau,$$

and for any $s > 0$ we have the so-called Jackson inequality

$$\sigma_k(f, \Phi) \le C_\tau(B) \cdot \|f\|_{\ell^\tau(\Phi)} \cdot k^{-s}, \quad s = \frac{1}{\tau} - \frac{1}{2}, \quad \forall f \in \ell^\tau(\Phi),\ \forall k \in \mathbb N \tag{1.5}$$

where, as indicated by the notation, the constant $C_\tau(B)$ only depends on $\tau$ and the upper
bound $B$ in (1.4). Note that the upper bound (1.4) holds true whenever the dictionary is a
frame: $B$ is then called the upper frame bound, and we will use this terminology.

When $\Phi$ is an orthogonal basis, a converse result is true: if $\sigma_k(f, \Phi)$ decays as $k^{-s}$ then
$\|f\|_{\ell^\tau_w(\Phi)} < \infty$, where $\ell^\tau_w$ is a weak $\ell^\tau$ space [8] and $s = 1/\tau - 1/2$. More generally, inverse
estimates are related to a Bernstein inequality [7, 6]:

$$\|f_k\|_{\ell^\tau(\Phi)} \le C \cdot k^r \cdot \|f_k\|_{\mathcal H}, \quad \forall f_k \in \Sigma_k(\Phi),\ \forall k. \tag{1.6}$$

The inequality (1.6) is related to the so-called Bernstein-Nikolsky inequality; we refer the reader
to [7, 6] for more information.
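As a sanity check of (1.6), the orthonormal-basis case can be worked out directly from Hölder's inequality. The derivation below is a sketch under the assumption that $\Phi$ is an orthonormal basis, so that each $f_k \in \Sigma_k(\Phi)$ has a unique coefficient vector $c$ with support $I$, $|I| \le k$, and $\|f_k\|_{\mathcal H} = \|c\|_2$. Under that assumption the Bernstein inequality holds with constant $C = 1$ and exponent $r = 1/\tau - 1/2$.

```latex
% Assumptions: \Phi is an orthonormal basis, f_k = \Phi c, supp(c) = I, |I| \le k, 0 < \tau < 2.
% (For \tau = 2 the statement is the identity \|c\|_2 = \|f_k\|_{\mathcal H}.)
\begin{align*}
\|f_k\|_{\ell^\tau(\Phi)}^\tau = \|c\|_\tau^\tau
  &= \sum_{j \in I} |c_j|^\tau \cdot 1 \\
  &\le \Big( \sum_{j \in I} |c_j|^2 \Big)^{\tau/2} \Big( \sum_{j \in I} 1 \Big)^{1 - \tau/2}
     && \text{(H\"older with exponents } 2/\tau \text{ and } 2/(2-\tau)\text{)} \\
  &\le \|c\|_2^\tau \, k^{1 - \tau/2},
\end{align*}
so that
\[
\|f_k\|_{\ell^\tau(\Phi)} \le k^{1/\tau - 1/2} \, \|c\|_2 = k^{1/\tau - 1/2} \, \|f_k\|_{\mathcal H}.
\]
```

The difficulty for redundant dictionaries is that $\|f_k\|_{\mathcal H}$ may be much smaller than $\|c\|_2$, which is where the kernel of $\Phi$ enters the picture.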

1.4 When approximation spaces are sparsity spaces 

When a Jackson inequality holds together with a Bernstein inequality with matching exponent
$r = 1/\tau - 1/2$, it is possible to characterize (with equivalent (quasi)-norms) the approximation
spaces $\mathcal A^s_q(\Phi)$ as real interpolation spaces [6, Chapter 7] between $\mathcal H$ and $\ell^\tau(\Phi)$, denoted $(\mathcal H, \ell^\tau(\Phi))_{\theta,q}$,
where $s = \theta r$, $0 < \theta < 1$. The definition of real interpolation spaces will be recalled in
Section 2. Let us just mention here that it is based on decay properties of the K-functional

$$K(f, t; \mathcal H, \ell^\tau(\Phi)) = \inf_{c \in \ell^\tau} \{ \|f - \Phi c\|_{\mathcal H} + t \|c\|_\tau \}. \tag{1.7}$$

A priori, without a more explicit description of real interpolation spaces, the characterization
of approximation spaces as interpolation spaces may seem just a sterile pedantic rewriting.
Fortunately, we show in Section 2 (Theorem 2.1) that the Bernstein inequality (1.6), together with
the upper frame bound (1.4), allows us to directly identify approximation spaces with sparsity
spaces, with equivalent (quasi)-norms, for certain ranges of parameters. In particular, the
following result can be obtained as a consequence of Theorem 2.1.

Theorem 1.1. Suppose that $\Phi$ satisfies the upper frame bound (1.4) with constant $B$ as well as the
Bernstein inequality (1.6) with some $0 < \tau \le 1$, with exponent $r = 1/\tau - 1/2$ and constant $C$. Then
we have

$$\mathcal A^r_\tau(\Phi) = \ell^\tau(\Phi) \tag{1.8}$$

with equivalent norms, i.e.

$$c_1(B, C) \cdot \|f\|_{\ell^\tau(\Phi)} \le \|f\|_{\mathcal A^r_\tau(\Phi)} := \|f\|_{\mathcal H} + |f|_{\mathcal A^r_\tau(\Phi)} \le c_2(B, C) \cdot \|f\|_{\ell^\tau(\Phi)}$$

where the constants only depend on $B$ and $C$.
In other words, under the assumptions of Theorem 1.1 a data vector $f \in \mathcal H$ can be approximated
by $k$-term expansions at rate $k^{-r}$ (in the sense $f \in \mathcal A^r_\tau(\Phi)$, where $r = 1/\tau - 1/2$) if, and
only if, $f$ admits a sparse representation $f = \sum_j c_j \varphi_j$ with $\sum_j |c_j|^\tau < \infty$.

1.5 Ideal vs practical approximation algorithms 

Consider a function $f$ that can be approximated at rate $k^{-r}$ using $k$-term expansions from $\Phi$:
$\sigma_k(f, \Phi) \lesssim k^{-r}$, $\forall k \ge 1$. Under the assumptions of the above Theorem, we can conclude that
the function $f$ indeed admits a representation $f = \sum_j c_j \varphi_j$ with $\sum_j |c_j|^\tau < \infty$. Suppose that we
know how to compute such a representation (e.g., that we can solve the optimization problem
$\min \|c\|_\tau$ subject to $f = \Phi c$). Then, sorting the coefficients in decreasing order of magnitude
$|c_{j_m}| \ge |c_{j_{m+1}}|$, one can build a simple sequence of $k$-term approximants $f_k := \sum_{m=1}^{k} c_{j_m} \varphi_{j_m}$
which converge to $f$ at the rate $r$: $\|f - f_k\|_{\mathcal H} \lesssim k^{-r}$. Note that one may not however be able to
guarantee that $\|f - f_k\|_{\mathcal H} \le C \sigma_k(f, \Phi)$ for a fixed constant $C < \infty$.
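The sort-and-truncate construction just described can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary is a small hypothetical one in $\mathbb R^2$ (two canonical vectors plus their normalized sum), the coefficient vector `coeffs` is taken as given rather than obtained by solving the minimization problem, and the helper names are not from the paper.

```python
import math

def k_term_approximants(atoms, coeffs):
    """Return [f_0, f_1, ..., f_n], where f_k keeps the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda j: abs(coeffs[j]), reverse=True)
    f_k = [0.0] * len(atoms[0])
    approximants = [f_k]
    for j in order:
        # Add the next atom, weighted by its coefficient, to the running approximant.
        f_k = [x + coeffs[j] * a for x, a in zip(f_k, atoms[j])]
        approximants.append(f_k)
    return approximants

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

# Hypothetical redundant dictionary in R^2 with three unit-norm atoms.
r = 1 / math.sqrt(2)
atoms = [[1.0, 0.0], [0.0, 1.0], [r, r]]
coeffs = [0.5, -0.1, 2.0]  # assumed to be a given representation of f

approximants = k_term_approximants(atoms, coeffs)
f = approximants[-1]  # f = Phi c, the full expansion
errors = [norm2([x - y for x, y in zip(f, f_k)]) for f_k in approximants]
print(errors)
```

In a redundant dictionary these errors are upper bounds on $\sigma_k(f, \Phi)$, not necessarily equal to it, which is the caveat stated in the last sentence above.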
A special case of interest is $\tau = 1$, where the optimization problem

$$\min \|c\|_1 \ \text{subject to} \ f = \Phi c$$

is convex, and the unit ball in $\ell^1(\Phi)$ is simply the convex hull of the symmetrized dictionary
$\{\pm \varphi_j\}_j$ with $\varphi_j$ the atoms of the dictionary $\Phi$. Therefore, under the assumptions of the above
Theorem for $\tau = 1$, if a function can be approximated at rate $k^{-1/2}$ then, after proper rescaling,
it belongs to the convex hull of the symmetrized dictionary, and there exist constructive
algorithms such as Orthogonal Matching Pursuit [13, 14] which are guaranteed to provide
the rate of approximation $k^{-1/2}$ [8, Theorem 3.7].

1.6 Null Space Properties and fragility of Bernstein inequalities 

On the one hand, it is known [11] that Jackson inequalities are always satisfied provided that
the dictionary is a frame, i.e.,

$$A \|f\|_{\mathcal H}^2 \le \|\Phi^* f\|_2^2 \le B \|f\|_{\mathcal H}^2, \quad \forall f \in \mathcal H. \tag{1.9}$$

The upper bound $B$ is actually equivalent to the upper frame bound (1.4) and therefore sufficient
for a Jackson inequality to hold.

On the other hand, Bernstein inequalities are known to be much more subtle and seemingly
fragile: they may be satisfied for certain structured dictionaries, but not for arbitrarily small
perturbations thereof [10].

In Section 3, for the sake of simplicity we restrict our attention to the case $\tau = 1$ when
the dictionary $\Phi$ forms a frame for a general Hilbert space $\mathcal H$. We show that the Bernstein
inequality for $\ell^1(\Phi)$,

$$\|\Phi c\|_{\ell^1(\Phi)} \le C m^{1/2} \|\Phi c\|_{\mathcal H}, \quad \forall c : \|c\|_0 \le m,\ \forall m \ge 1, \tag{1.10}$$

is closely linked to properties of the kernel of $\Phi$ given by

$$N(\Phi) := \{ z \in \ell^2 : \Phi z = 0 \}.$$

The seemingly simple case where we have a one dimensional null space for the dictionary,
$N(\Phi) = \mathrm{span}\{z\}$ for some fixed sequence $z$, is particularly useful to demonstrate the fragility
of the Bernstein estimates as the following example shows.
Example 1.2. Consider any infinite dictionary $\Phi$ with $N(\Phi) = \mathrm{span}\{z\}$, where $z = (z_j)_{j=1}^{\infty} \in \ell^p$
for some $0 < p < 1$. Then for each $\varepsilon > 0$, there is a vector $\tilde z$ with $\|z - \tilde z\|_p < \varepsilon$ such that the
Bernstein inequality (1.10) fails for any dictionary $\tilde\Phi$ with $N(\tilde\Phi) = \mathrm{span}\{\tilde z\}$.

A specific case is given by $\Phi = B \cup \{g\}$, with $B$ the Dirac basis for $\ell^2$ and $g \in \ell^p$ for
some $0 < p < 1$. Then we can find an arbitrarily small perturbation $\tilde g$ of $g$ in $\ell^p$ such that the
Bernstein inequality fails for the "perturbed" dictionary $\tilde\Phi = B \cup \{\tilde g\}$.

Notice that in the preceding example, nothing was assumed about the Bernstein inequality
for the dictionary $\Phi$ itself. Thus, arbitrarily close to any dictionary with a reasonable one
dimensional null space, there is a "bad" dictionary.

However, it is possible to find good dictionaries with a one dimensional null space for
which (1.6) holds. The following is an example of such a dictionary.

Example 1.3. Suppose $\Phi$ satisfies $N(\Phi) = \mathrm{span}\{z\}$, where $z = (z_j)_{j=1}^{\infty}$ is such that there is a
constant $C < \infty$ satisfying

$$\forall k \in \mathbb N : \quad \sum_{j=k}^{\infty} |z_j| \le C |z_k|.$$

Then the Bernstein inequality (1.10) holds true.

An explicit implementation of this example is given by $\Phi = B \cup \{g\}$, with $B = \{e_k\}_{k \in \mathbb N}$ an
orthonormal basis for $\ell^2$ and $g = \sum_{k=1}^{\infty} a^k e_k$ for some fixed $0 < a < 1$.

Examples 1.2 and 1.3 combined show that one can always perturb a nice dictionary $\Phi$ for
which (1.6) holds ever so slightly as to make (1.6) collapse.

We justify the two examples in Section 3 by performing a careful analysis of the Bernstein
inequality (1.10) when $\Phi$ is a frame. In Section 3.1 we study the general frame dictionary and
derive a sufficient condition stated in Proposition 3.1 for (1.10) to hold. Then in Section 3.2 we
present a more refined analysis (Proposition 3.2) in the special case where the kernel $N(\Phi)$ is
one-dimensional. The proof of Proposition 3.2 is based on an application of the general results
in Section 3.1.

1.7 Incoherence and the Restricted Isometry Property 

The above examples illustrate that the Bernstein inequality (and its nice consequences such as
Theorem 1.1) can be fairly fragile. However, this could be misleading, and we will now show
that in a certain sense "most" dictionaries satisfy the inequality in a robust manner.

In a previous work [12] we showed that incoherent frames satisfy a "robust" Bernstein
inequality, although with an exponent $r = 2(1/\tau - 1/2)$ instead of the exponent $s = 1/\tau - 1/2$
that would match the Jackson inequality. This inequality is then robust, because small enough
perturbations of incoherent dictionaries remain incoherent.

In the last decade, a very intense activity related to Compressed Sensing [9] has led to the
emergence of the concept of Restricted Isometry Property (RIP) [3], which generalizes the
notion of coherence. A dictionary $\Phi$ is said to satisfy the RIP of order $k$ with constant $\delta$ if, for
any coefficient sequence $c$ satisfying $\|c\|_0 \le k$, we have

$$(1 - \delta) \cdot \|c\|_2^2 \le \|\Phi c\|_{\mathcal H}^2 \le (1 + \delta) \cdot \|c\|_2^2. \tag{1.11}$$

The RIP has been widely studied for random dictionaries, and used to relate the minimum $\ell^1$
norm solution $c^\star$ of an inverse linear problem $f = \Phi c$ to a "ground truth" solution $c_0$ which is
assumed to be ideally sparse (or approximately sparse).
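The link between the RIP and coherence can be seen on the smallest nontrivial case, order $k = 2$. The sketch below computes the best constant $\delta$ in (1.11) for $k = 2$ by brute force over all pairs of atoms, using the closed-form eigenvalues of each $2 \times 2$ Gram matrix. The dictionary and the function name `rip_constant_order2` are hypothetical; for unit-norm atoms the result coincides with the largest absolute inner product between distinct atoms.

```python
import itertools
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rip_constant_order2(atoms):
    """Smallest delta with (1-delta)|c|^2 <= |Phi c|^2 <= (1+delta)|c|^2 for all 2-sparse c."""
    # 1-sparse vectors: the squared atom norms must lie in [1 - delta, 1 + delta].
    delta = max(abs(dot(a, a) - 1.0) for a in atoms)
    for u, v in itertools.combinations(atoms, 2):
        a, b, d = dot(u, u), dot(u, v), dot(v, v)
        # Eigenvalues of the symmetric Gram matrix [[a, b], [b, d]].
        mean = (a + d) / 2.0
        radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
        delta = max(delta, (mean + radius) - 1.0, 1.0 - (mean - radius))
    return delta

# Hypothetical dictionary in R^2: canonical basis plus one extra unit-norm atom.
r = 1 / math.sqrt(2)
atoms = [[1.0, 0.0], [0.0, 1.0], [r, r]]
print(rip_constant_order2(atoms))
```

For larger orders $k$ the number of supports grows combinatorially, so this brute-force approach is only meant to illustrate the definition.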



RR n° 7548 



The restricted isometry property meets nonlinear approximation with redundant frames 






In this paper, we are a priori not interested in "recovering" a coefficient vector $c_0$ from the
observation $f = \Phi c_0$. Instead, we wish to understand how the rate of ideal (but NP-hard)
$k$-term approximation of $f$ using $\Phi$ is related to the existence of a representation with small $\ell^\tau$
norm.

In Section 4 we study finite-dimensional dictionaries, where it turns out that the lower
bound in the RIP (1.11) provides an appropriate tool to obtain Bernstein inequalities with
controlled constant³. Namely, we say that the dictionary $\Phi$ satisfies LRIP($k$, $\delta$) with a constant
$\delta < 1$ provided that

$$\|\Phi c\|_{\mathcal H}^2 \ge (1 - \delta) \cdot \|c\|_2^2, \tag{1.12}$$

for any sequence $c$ satisfying $\|c\|_0 \le k$. We prove (Lemma 4.1) that in $\mathcal H = \mathbb R^m$ the lower frame
bound $A > 0$ and the LRIP($\kappa N$, $\delta$) imply a Bernstein inequality for $0 < \tau \le 2$ with exponent
$r = 1/\tau - 1/2$. As a result we have:

Theorem 1.4. Let $\Phi$ be an $m \times N$ frame with frame bounds $0 < A \le B < \infty$. Assume that $\Phi$ satisfies
LRIP($\kappa N$, $\delta$), where $\delta < 1$ and $0 < \kappa < 1$. Then

• for $0 < \tau \le 2$, the Bernstein inequality (1.6) holds with exponent $r = 1/\tau - 1/2$ and a constant
$C_\tau(A, \delta, \kappa) < \infty$ (cf. Eq. (4.1));

• for $0 < \tau \le 1$, $0 < \theta < 1$, we have, with equivalent norms,

$$\mathcal A^r_\tau(\Phi) = \ell^\tau(\Phi) = (\mathcal H, \ell^p(\Phi))_{\theta,\tau}, \quad \frac{1}{\tau} = \frac{\theta}{p} + \frac{1 - \theta}{2}.$$

The constant $C_\tau(A, \delta, \kappa)$ and the constants in norm equivalences may depend on $A$, $B$, $\delta$, and $\kappa$, but
they do not depend on the dimension $N$.

For random Gaussian dictionaries, the typical order of magnitude of $A$, $\delta(\kappa)$ is known
and governed by the aspect ratio $R := N/m$ of the dictionary, provided that it is sufficiently
high dimensional (its number of rows should be above a threshold $m(R)$ implicitly defined in
Section 13). We obtain the following theorem. 

Theorem 1.5. Let $\Phi$ be an $m \times N$ matrix with i.i.d. Gaussian entries $\mathcal N(0, 1/m)$. Let $R :=
N/m$ be the redundancy of the dictionary. If $m \ge m(R)$ then, except with probability at most
lOR^ ■ exp{—j{R)m), we have for allO < r < 1 the equality 

$$\mathcal A^r_\tau(\Phi) = \ell^\tau(\Phi) = (\mathcal H, \ell^p(\Phi))_{\theta,\tau}, \quad r = 1/\tau - 1/2 = \theta (1/p - 1/2), \tag{1.13}$$

with equivalent norms.

The constants driving the equivalence of the norms are universal: they only depend on $\tau$ and the
redundancy factor $R$ but not on the individual dimensions $m$ and $N$. Similarly $\gamma(R)$ and $m(R)$ only
depend on $R$.

Tor R > 1.27 we have j{R) > 7 ■ 10"*'. Tor large R we have ^{R) k. 0.002. 

Indeed, for random Gaussian dictionaries in high dimension, with high probability, the
Bernstein inequality holds for all $0 < \tau \le 2$ with constants driven by the aspect ratio $R :=
N/m$ but otherwise independent of the dimension $N$. Using the notion of decomposable dictionary
[12, Theorem 3.3], this finite dimensional result can be easily adapted to build arbitrarily
overcomplete dictionaries in infinite dimension that satisfy the equality (1.8).

³ The control of constants is the crucial part, since in finite dimension all norms are equivalent, which implies that
the Bernstein inequality is always trivially satisfied.
The result of Theorem 1.5 should be compared to our earlier result for incoherent frames
obtained in [12]. In [12] we found an incoherent dictionary with aspect ratio (approximately)
2 for which the Bernstein inequality (1.6) can be shown to hold only for the exponent $r =
2(1/\tau - 1/2)$, i.e., for $r$ twice as large as the Jackson exponent $s = 1/\tau - 1/2$. Theorem 1.5
illustrates that the result in [12] really corresponds to a "worst case" behaviour and there are
indeed many dictionaries (according to the Gaussian measure: the overwhelming majority of
dictionaries) with a much better behaviour with respect to Bernstein estimates. This holds true
even for aspect ratios $R$ that can be arbitrarily large.

1.8 Conclusion and discussion 

The restricted isometry property is a concept that has been essentially motivated by the
understanding of sparse regularization for linear inverse problems such as compressed sensing.
Beyond this traditional use of the concept, we have shown new connections between the RIP
and nonlinear approximation.

The main result we obtained is that, from the point of view of nonlinear approximation, a
frame which satisfies a nontrivial restricted isometry property $\delta_k < 1$ (i.e., in the regime $k \propto N$) behaves
like an orthogonal basis: the optimal rate of $m$-term approximation can be achieved with an
approach that does not involve solving a (potentially) NP-hard problem to compute the best
$m$-term approximation for each $m$. In such nice dictionaries, near optimal $k$-term approximation
can be achieved in two steps, like in an orthonormal basis:

• decompose the data vector $f = \sum_j c_j \varphi_j$, with coefficients as sparse as possible in the sense
of minimum $\ell^\tau$ norm;

• keep the $m$ largest coefficients to build the approximant $f_m := \sum_{j \in I_m} c_j \varphi_j$.
The second main result is that redundant dictionaries with the above property are not the
exception, but rather the rule. While it is possible to build nasty overcomplete dictionaries
either directly or by arbitrarily small perturbations of some "nice" dictionaries, in a certain
sense the vast majority of overcomplete dictionaries are nice.

One should note that several results of this paper are expressed in finite dimension, where 
all norms are equivalent. The strength of the results is therefore not the mere existence of 
inequalities between norms, but in the fact that the involved constants do not depend on 
the dimension. From a numerical perspective, the control of these constants has essentially 
an impact in (very) large dimension, and it is not clear whether the constants numerically 
computed for random dictionaries are useful for dimensions less than a few millions. 

A few key questions remain open. For a given data vector $f$, it is generally not known in
advance to which $\ell^\tau(\Phi)$ space $f$ belongs: under which conditions is it possible to efficiently
compute a sparse decomposition $f = \sum_j c_j \varphi_j$ which is guaranteed to be near optimal in the
sense that $\|c\|_\tau$ is almost minimum whenever $f \in \ell^\tau(\Phi)$? Can $\ell^1$ minimization (which is
convex) be used and provide near best performance under certain conditions? This is left to
future work.

2 Interpolation spaces 

We recall the definition of the K-functional. Let $Y$ be a (quasi-)normed space continuously
embedded in a Hilbert space $\mathcal H$. For $f \in \mathcal H$, the K-functional is defined by

$$K(f, t) = K(f, t; \mathcal H, Y) := \inf_{g \in Y} \{ \|f - g\|_{\mathcal H} + t \|g\|_Y \}$$

and the norm defining the interpolation spaces $(\mathcal H, Y)_{\theta,q}$, $0 < \theta < 1$, $0 < q < \infty$, is given by:

$$\|f\|_{(\mathcal H, Y)_{\theta,q}}^q := \int_0^\infty \big[ t^{-\theta} K(f, t) \big]^q \frac{dt}{t} \asymp \sum_{j \ge 0} 2^{j \theta q} K(f, 2^{-j})^q.$$

The interpolation space $(\mathcal H, Y)_{\theta,q}$ is simply the set of $f$ for which the norm is finite. In our case
we consider a frame dictionary $\Phi$ and $Y = \ell^p(\Phi)$, which is continuously embedded in $\mathcal H$ for
$0 < p \le 2$. We have the following result.
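The equivalence between the integral and the dyadic sum in the definition of the interpolation norm follows from elementary monotonicity properties of the K-functional. The following is a sketch of that standard argument, not taken from the paper; the embedding constant $C_Y$ (with $\|g\|_{\mathcal H} \le C_Y \|g\|_Y$) is notation introduced here for illustration.

```latex
% Facts used: K(f,t) is nondecreasing in t, and K(f,t/2) \ge K(f,t)/2.
For $t \in [2^{-j-1}, 2^{-j}]$ with $j \ge 0$,
\[
\tfrac12 K(f, 2^{-j}) \le K(f, 2^{-j-1}) \le K(f, t) \le K(f, 2^{-j}),
\qquad
2^{j\theta} \le t^{-\theta} \le 2^{(j+1)\theta}.
\]
Since $\int_{2^{-j-1}}^{2^{-j}} \frac{dt}{t} = \ln 2$, this gives
\[
\ln 2 \cdot 2^{-q} \big[ 2^{j\theta} K(f, 2^{-j}) \big]^q
\le \int_{2^{-j-1}}^{2^{-j}} \big[ t^{-\theta} K(f,t) \big]^q \frac{dt}{t}
\le \ln 2 \cdot 2^{\theta q} \big[ 2^{j\theta} K(f, 2^{-j}) \big]^q ,
\]
and summing over $j \ge 0$ handles the range $0 < t \le 1$.
For $t \ge 1$, taking $g = 0$ shows $K(f,t) \le \|f\|_{\mathcal H}$, hence
\[
\int_1^\infty \big[ t^{-\theta} K(f,t) \big]^q \frac{dt}{t} \le \frac{\|f\|_{\mathcal H}^q}{\theta q},
\qquad
\|f\|_{\mathcal H} \le \|f - g\|_{\mathcal H} + C_Y \|g\|_Y \ \Rightarrow\ \|f\|_{\mathcal H} \le \max(1, C_Y)\, K(f, 1),
\]
so this part of the integral is controlled by the $j = 0$ term of the sum.
```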

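Theorem 2.1. Suppose $\Phi$ is a frame dictionary for a Hilbert space $\mathcal H$. Let $0 < \tau \le 1$ and suppose the
Bernstein inequality for $\ell^\tau(\Phi)$ holds with exponent $r$:

$$\|f_k\|_{\ell^\tau(\Phi)} \le C \cdot k^r \cdot \|f_k\|_{\mathcal H}, \quad \forall f_k \in \Sigma_k(\Phi),\ \forall k.$$

Define $\beta := r / (1/\tau - 1/2)$. Then, for all $0 < \theta < 1$, we have the embedding

$$(\mathcal H, \ell^p(\Phi))_{\theta,\tau} \hookrightarrow \ell^\tau(\Phi), \quad \frac{1}{\tau} = \frac{\theta}{\beta p} + \frac{1 - \theta/\beta}{2}.$$

Moreover, we have

$$\mathcal A^r_\tau(\Phi) \hookrightarrow \ell^\tau(\Phi),$$

and if in addition $r = 1/\tau - 1/2$ (i.e., $\beta = 1$), then

$$\mathcal A^r_\tau(\Phi) = \ell^\tau(\Phi) = (\mathcal H, \ell^p(\Phi))_{\theta,\tau}$$

with equivalent norms. The constants in the norm inequalities depend only on $p$, on the Bernstein
constant for $\ell^\tau(\Phi)$, and on the upper frame bound for $\Phi$.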

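Proof. We use the general technique proposed by DeVore and Popov [7], and adapt it to the
special structure of the considered function spaces. One can check that with $Y = \ell^p(\Phi)$ we
have

$$K(f, t) = \inf_{c \in \ell^p} \{ \|f - \Phi c\|_{\mathcal H} + t \|c\|_p \}.$$

For each $j$ we consider $c_j$ an (almost) minimizer of the right hand side above for $t = 2^{-j}$. Fix
$0 < \theta < 1$ and define $s := r/\theta$ and $p \le 2$ such that $s = 1/p - 1/2$, and set $m_j = \lfloor 2^{j/s} \rfloor \asymp 2^{j/s}$.
Define $\tilde c_j$ to match $c_j$ on its $m_j$ largest coordinates, and be zero anywhere else. Finally, define
$f_0 := 0$, $f_j := \Phi \tilde c_j$, $j \in \mathbb N$. We can observe that

$$\begin{aligned}
\|f - f_j\|_{\mathcal H} &\le \|f - \Phi c_j\|_{\mathcal H} + \|\Phi (c_j - \tilde c_j)\|_{\mathcal H} \\
&\overset{(a)}{\lesssim} \|f - \Phi c_j\|_{\mathcal H} + \|c_j - \tilde c_j\|_2 \\
&\le \|f - \Phi c_j\|_{\mathcal H} + \|c_j\|_p \cdot m_j^{-s} \lesssim \|f - \Phi c_j\|_{\mathcal H} + \|c_j\|_p \cdot 2^{-j} \\
&\lesssim K(f, 2^{-j}),
\end{aligned}$$

where in (a) we used the upper frame bound $B$ of $\Phi$. Accordingly we get

$$\|f_{j+1} - f_j\|_{\mathcal H} \le \|f - f_j\|_{\mathcal H} + \|f - f_{j+1}\|_{\mathcal H} \le C \cdot K(f, 2^{-j})$$

where the constant only depends on $p$ and the upper frame bound $B$ of $\Phi$. Since $\tau \le 1$, we
have the quasi-triangle inequality

$$\|u + v\|_{\ell^\tau(\Phi)}^\tau \le \|u\|_{\ell^\tau(\Phi)}^\tau + \|v\|_{\ell^\tau(\Phi)}^\tau.$$

Since $f = \lim_{j \to \infty} f_j = \sum_{j=0}^{\infty} (f_{j+1} - f_j)$ we obtain

$$\begin{aligned}
\|f\|_{\ell^\tau(\Phi)}^\tau &\le \sum_{j \ge 0} \|f_{j+1} - f_j\|_{\ell^\tau(\Phi)}^\tau \\
&\overset{(b)}{\lesssim} \sum_{j \ge 0} \big[ (2^{j/s})^r \|f_{j+1} - f_j\|_{\mathcal H} \big]^\tau \\
&\lesssim \sum_{j \ge 0} \big[ 2^{j\theta} K(f, 2^{-j}) \big]^\tau \asymp \|f\|_{(\mathcal H, \ell^p(\Phi))_{\theta,\tau}}^\tau.
\end{aligned}$$

In (b) we used the fact that $f_{j+1} - f_j \in \Sigma_m(\Phi)$ with $m = m_j + m_{j+1} \lesssim 2^{j/s}$, and the assumption
that the Bernstein inequality with exponent $r$ holds for $\ell^\tau(\Phi)$. To summarize we obtain
$(\mathcal H, \ell^p(\Phi))_{\theta,\tau} \subset \ell^\tau(\Phi)$, together with the norm inequality

$$\|f\|_{\ell^\tau(\Phi)} \le C \cdot \|f\|_{(\mathcal H, \ell^p(\Phi))_{\theta,\tau}}$$

where the constant only depends on the Bernstein constant for $\ell^\tau(\Phi)$, on $p$, and on the upper
frame bound $B$ for $\Phi$. We have $1/\tau - 1/2 = r/\beta = (r/(\beta s)) s = (\theta/\beta)(1/p - 1/2)$, i.e., $1/\tau =
(\theta/\beta)/p + (1 - \theta/\beta)/2$.

Similarly, we can define $f_0 = 0$ and $f_j$ a (near) best $m_j$-term approximation to $f$ with $m_j =
2^{j-1}$, $j \ge 1$, and obtain $\|f - f_j\|_{\mathcal H} \le 2 \sigma_{2^{j-1}}(f, \Phi)$, $j \ge 1$. Using the Bernstein inequality and
derivations essentially identical to the previous lines we get

$$\|f\|_{\ell^\tau(\Phi)}^\tau \lesssim \|f\|_{\mathcal H}^\tau + \sum_{j \ge 1} \big[ 2^{(j-1) r} \sigma_{2^{j-1}}(f, \Phi) \big]^\tau \lesssim \|f\|_{\mathcal A^r_\tau(\Phi)}^\tau.$$

The constant only depends on the Bernstein constant for $\ell^\tau(\Phi)$.

Using [11, Theorem 6], the upper frame bound implies the continuous embedding $\ell^\tau(\Phi) \hookrightarrow
\mathcal A^s_\tau(\Phi)$ with $s = 1/\tau - 1/2$. Hence, when the Bernstein exponent is $r = 1/\tau - 1/2 = s$ we have
equality, that is to say $\mathcal A^r_\tau(\Phi) = \ell^\tau(\Phi)$ with equivalent norms. □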
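The proof uses the estimate $\|c_j - \tilde c_j\|_2 \le \|c_j\|_p \cdot m_j^{-s}$ with $s = 1/p - 1/2$, a standard fact about best $m$-term approximation of sequences. As a hedged numerical illustration (not a proof), the sketch below checks this bound on a hypothetical decaying sequence; the helper names, the sequence and the choice $p = 1/2$ are assumptions made only for this example.

```python
import math

def lp_quasinorm(c, p):
    """The l^p (quasi-)norm of a finite sequence."""
    return sum(abs(x) ** p for x in c) ** (1.0 / p)

def tail_l2(c, m):
    """l2 norm of c after removing its m largest-magnitude entries."""
    tail = sorted((abs(x) for x in c), reverse=True)[m:]
    return math.sqrt(sum(t * t for t in tail))

p = 0.5
s = 1.0 / p - 0.5
# Hypothetical compressible sequence with alternating signs.
c = [(-1) ** k / (k + 1) ** 3 for k in range(200)]

checks = [(tail_l2(c, m), lp_quasinorm(c, p) * m ** (-s)) for m in range(1, 50)]
print(all(lhs <= rhs for lhs, rhs in checks))
```

The left-hand side of each pair is the truncation error, the right-hand side is the bound used in step (a) of the proof after applying the upper frame bound.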

Remark 2.2. A consequence of Theorem 2.1 is a partial answer to an open question raised
in [12], where "blockwise incoherent dictionaries" are considered and a Bernstein inequality
is proved, with exponent $r = \beta(1/\tau - 1/2)$, $\beta = 2$, for all $0 < \tau \le 2$, yielding the two-sided
embedding [12, Theorem 3.2]:

A'^{^)^[Ti,e{^))y2,c,, 0<T<2, T<q<<S,, s = 1/t -1/2. 

By Theorem 2.1, for $0 < q \le 1$, the Bernstein inequality with exponent $r = \beta(1/q - 1/2)$
further implies the embedding $(\mathcal H, \ell^\tau(\Phi))_{1/2,q} \hookrightarrow \ell^q(\Phi)$ where $1/q = 1/(4\tau) + 3/8$, i.e.,
$q = 8\tau/(3\tau + 2)$. As a result we have

^ A\{^) ^ 0<T<2/5, = 8t/(3t + 2), s = 1/t- 1/2. 

We know from [12] an example of blockwise incoherent dictionary where the exponent of the
Bernstein inequality cannot be improved, hence the above embedding is also sharp for this class
of dictionaries.






3 Bernstein estimates for frame dictionaries 

In this section we are interested in the Bernstein inequality (1.10) in the general case where
the dictionary $\Phi$ forms a frame for a Hilbert space $\mathcal H$. The dimension of $\mathcal H$ may be finite or
infinite. We will show that the Bernstein inequality is closely linked to properties of the kernel
of $\Phi$ given by

$$N = N(\Phi) := \{ z \in \ell^2 : \Phi z = 0 \}.$$

In fact, the frame property ensures that $\|\Phi c\|_{\mathcal H} \asymp \inf_{z \in N} \|c + z\|_2$ for any sequence $c \in \ell^2$.
Hence, the Bernstein inequality (1.10) holds if and only if the quantity

$$C(\Phi) := \sup_{m \in \mathbb N} \ \sup_{c : \|c\|_0 \le m} \ \sup_{z \in N} \ \inf_{v \in N} \ \frac{\|c + v\|_1}{m^{1/2} \|c + z\|_2} \tag{3.1}$$

is finite.

We split our analysis in two parts. In Section 3.1 we derive an upper bound on $C(\Phi)$ that
results in a sufficient condition for (1.10) to hold for a general frame dictionary (Proposition
3.1). In Section 3.2 we specialize to the case where the kernel $N(\Phi)$ is one-dimensional.

The analysis is used to justify Examples 1.2 and 1.3.



3.1 Bernstein constant for general dictionaries 

Here we derive an upper estimate of the quantity $C(\Phi)$, given by (3.1), for general frame
dictionaries in a Hilbert space. This estimate leads to the following sufficient condition for a
Bernstein inequality for such dictionaries.

Proposition 3.1. Suppose the dictionary $\Phi$ forms a frame for the Hilbert space $\mathcal H$, and $\Phi$ has kernel
$N := N(\Phi)$. Then the Bernstein inequality (1.10) holds provided that

$$\sup_{z \in N} \ \sup_{m \in \mathbb N} \ \sup_{I : |I| \le m} \ m^{-1/2} \cdot \frac{\|z_{I^c}\|_1}{\|z_{I^c}\|_2} < \infty. \tag{3.2}$$

Moreover, in the case where $N \cap \ell^1 = \{0\}$, the Bernstein inequality (1.10) holds if and only if

$$C_1(\Phi) := \sup_{z \in N} \ \sup_{m \in \mathbb N} \ \frac{\|z_m\|_1}{m^{1/2} \sigma_m(z)_2} < \infty, \tag{3.3}$$

where $z_m$ is the vector containing the $m$ largest entries in $z$ and $\sigma_m(z)_2 := \|z - z_m\|_2$.

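Proof. We prove the sufficient condition for (1.10) by deriving an upper bound for $C(\Phi)$ given
by (3.1). For any $m \in \mathbb N$,

$$\begin{aligned}
\sup_{c : \|c\|_0 \le m} \sup_{z \in N} \inf_{v \in N} \frac{\|c + v\|_1}{m^{1/2} \|c + z\|_2}
&= \sup_{I : |I| \le m} \ \sup_{c : \mathrm{supp}(c) \subseteq I} \ \sup_{z \in N} \inf_{v \in N} \frac{\|c + v\|_1}{m^{1/2} \|c + z\|_2} \\
&= \sup_{I : |I| \le m} \ \sup_{c : \mathrm{supp}(c) \subseteq I} \ \sup_{z \in N} \inf_{v \in N} \frac{\|c + v_I\|_1 + \|v_{I^c}\|_1}{m^{1/2} \sqrt{\|c + z_I\|_2^2 + \|z_{I^c}\|_2^2}} \\
&\le \sup_{I : |I| \le m} \ \sup_{c : \mathrm{supp}(c) \subseteq I} \ \sup_{z \in N} \frac{\|c + z_I\|_1 + \|z_{I^c}\|_1}{m^{1/2} \sqrt{\|c + z_I\|_2^2 + \|z_{I^c}\|_2^2}}.
\end{aligned} \tag{3.4}$$

For a given support $I$ and $z \in N$, we introduce $\gamma_I^z := \|z_{I^c}\|_1 / \|z_{I^c}\|_2$. Notice that for $|I| \le m$
and $c$ with $\mathrm{supp}(c) \subseteq I$, we have

$$\begin{aligned}
\|c + z_I\|_1 + \|z_{I^c}\|_1 &\le m^{1/2} \|c + z_I\|_2 + \gamma_I^z \|z_{I^c}\|_2 \\
&\le \max\{m^{1/2}, \gamma_I^z\} \big( \|c + z_I\|_2 + \|z_{I^c}\|_2 \big) \\
&\le \sqrt{2} \max\{m^{1/2}, \gamma_I^z\} \cdot \sqrt{\|c + z_I\|_2^2 + \|z_{I^c}\|_2^2}.
\end{aligned} \tag{3.5}$$

We let $\gamma_m^z := \sup_{I : |I| \le m} \gamma_I^z$. Hence, from (3.4) we deduce that

$$C(\Phi) \le \sqrt{2} \max\Big( 1, \ \sup_{z \in N} \sup_{m \in \mathbb N} \frac{\gamma_m^z}{m^{1/2}} \Big),$$

which shows that condition (3.2) implies $C(\Phi) < \infty$.

Let us now consider the case $N \cap \ell^1 = \{0\}$. Notice that the infimum over $v \in N$ in (3.1) is
attained for $v = 0$. Hence, $C(\Phi) = \sup_{z \in N} B_z$, with

$$B_z := \sup_{m \in \mathbb N} \ \sup_{I : |I| \le m} \ \sup_{c : \mathrm{supp}(c) \subseteq I} \frac{\|c\|_1}{m^{1/2} \sqrt{\|c + z_I\|_2^2 + \|z_{I^c}\|_2^2}}. \tag{3.6}$$

For a fixed support $I$, standard estimates show that

$$\frac{\|c\|_1}{m^{1/2} \sqrt{\|c + z_I\|_2^2 + \|z_{I^c}\|_2^2}}$$

is maximal for choices of the type $c = -[z_I + \lambda \, \mathrm{sign}(z) \mathbf 1_I]$, $\lambda \ge 0$. This choice of $c$ leads to the
corresponding (squared) optimization problem

$$\sup_{\lambda \in \mathbb R_+} \frac{(\lambda m + \|z_I\|_1)^2}{m (\lambda^2 m + \|z_{I^c}\|_2^2)} = 1 + \frac{1}{m} \frac{\|z_I\|_1^2}{\|z_{I^c}\|_2^2}.$$

Notice that

$$\sup_{I : |I| \le m} \frac{\|z_I\|_1}{\|z_{I^c}\|_2} = \frac{\|z_m\|_1}{\sigma_m(z)_2},$$

so we deduce that

$$C_1(\Phi) \le C(\Phi) \le C_1(\Phi) + 1, \tag{3.7}$$

which completes the proof. □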
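The value of the supremum over $\lambda$ used at the end of the proof can be checked with the Cauchy-Schwarz inequality. The following is a sketch of that computation, with the shorthand $a := \|z_I\|_1$ and $b := \|z_{I^c}\|_2 > 0$ introduced here for readability and under the assumption $|I| = m$.

```latex
% Shorthand (introduced for this sketch): a = \|z_I\|_1 \ge 0, b = \|z_{I^c}\|_2 > 0, |I| = m.
\[
(\lambda m + a)^2
= \Big( \sqrt{m} \cdot \lambda \sqrt{m} + \frac{a}{b} \cdot b \Big)^2
\le \Big( m + \frac{a^2}{b^2} \Big) \big( \lambda^2 m + b^2 \big),
\]
with equality when $(\lambda \sqrt{m},\, b)$ is proportional to $(\sqrt{m},\, a/b)$,
i.e. for $\lambda = b^2 / a \ge 0$ when $a > 0$ (for $a = 0$ the bound is approached as $\lambda \to \infty$). Hence
\[
\sup_{\lambda \ge 0} \frac{(\lambda m + a)^2}{m (\lambda^2 m + b^2)} = 1 + \frac{a^2}{m b^2},
\qquad
\frac{a}{m^{1/2} b} \le \sqrt{1 + \frac{a^2}{m b^2}} \le 1 + \frac{a}{m^{1/2} b},
\]
and taking suprema over $I$, $m$ and $z$ in the last chain gives the two-sided bound (3.7).
```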

3.2 Dictionaries with one dimensional null-spaces 

We now turn to the simplified case where the dictionary $\Phi$ has a one-dimensional null-space.
In this case, we derive necessary conditions for the Bernstein inequality (1.10) to hold that are
valid even when $N(\Phi) \subset \ell^1$, a case not covered by the necessary condition of Proposition 3.1.
We prove the following:

Proposition 3.2. Suppose the dictionary $\Phi$ is a frame for the Hilbert space $\mathcal H$ and has a one-dimensional
null-space, $N(\Phi) = \mathrm{span}\{z\}$. Also suppose the Bernstein inequality (1.10) holds. Then

$$C_2(z) := \sup_{m \in \mathbb N} \ \sup_{I : |I| \le m} \ \min\Big( \frac{\|z_I\|_1}{m^{1/2} \|z_{I^c}\|_2}, \ \frac{\|z_{I^c}\|_1}{m^{1/2} \|z_{I^c}\|_2} \Big) < \infty. \tag{3.8}$$

Moreover, if $z \in \ell^p$ for some $0 < p < 1$, and the Bernstein inequality (1.10) holds for $\Phi$, then

$$\sup_{m \in \mathbb N} \frac{\sigma_m(z)_1}{m^{1/2} \sigma_m(z)_2} < \infty.$$
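Proof of Proposition 3.2. In this setting, the Bernstein inequality (1.10) holds if and only if the
quantity

$$C(\Phi) := \sup_{m \in \mathbb N} \ \sup_{c : \|c\|_0 \le m} \ \sup_{\mu \in \mathbb R} \ \inf_{\lambda \in \mathbb R} \ \frac{\|c + \lambda z\|_1}{m^{1/2} \|c + \mu z\|_2}$$

is finite. By rescaling, we have

$$\begin{aligned}
C(\Phi) &= \sup_{m \in \mathbb N} \ \sup_{c : \|c\|_0 \le m} \ \sup_{\mu \in \mathbb R \setminus \{0\}} \ \inf_{\lambda \in \mathbb R} \ \frac{\| \tfrac{\lambda}{\mu} z + \tfrac{1}{\mu} c \|_1}{m^{1/2} \| \tfrac{1}{\mu} c + z \|_2} \\
&= \sup_{m \in \mathbb N} \ \sup_{c : \|c\|_0 \le m} \ \inf_{\lambda \in \mathbb R} \ \frac{\|c + \lambda z\|_1}{m^{1/2} \|c + z\|_2} \\
&= \sup_{m \in \mathbb N} \ \sup_{I : |I| \le m} \ \sup_{c : \mathrm{supp}(c) \subseteq I} \ \inf_{\lambda \in \mathbb R} \ \frac{\|c + \lambda z\|_1}{m^{1/2} \|c + z\|_2} \\
&= \sup_{m \in \mathbb N} \ \sup_{I : |I| \le m} \ \sup_{c : \mathrm{supp}(c) \subseteq I} \ \inf_{\lambda \in \mathbb R} \ \frac{\|c + \lambda z_I\|_1 + \|\lambda z_{I^c}\|_1}{m^{1/2} \sqrt{\|c + z_I\|_2^2 + \|z_{I^c}\|_2^2}}.
\end{aligned} \tag{3.9}$$

To get a lower estimate for $C(\Phi)$, we simply choose $c = -z_I$ in (3.9) to obtain

$$\begin{aligned}
C(\Phi) &\ge \sup_{m \in \mathbb N} \ \sup_{I : |I| \le m} \ \inf_{\lambda \in \mathbb R} \ \frac{\|(\lambda - 1) z_I\|_1 + \|\lambda z_{I^c}\|_1}{m^{1/2} \|z_{I^c}\|_2} \\
&= \sup_{m \in \mathbb N} \ \sup_{I : |I| \le m} \ \min\Big( \frac{\|z_I\|_1}{m^{1/2} \|z_{I^c}\|_2}, \ \frac{\|z_{I^c}\|_1}{m^{1/2} \|z_{I^c}\|_2} \Big) \\
&\ge \sup_{m \in \mathbb N} \ \min\Big( \frac{\|z_m\|_1}{m^{1/2} \sigma_m(z)_2}, \ \frac{\sigma_m(z)_1}{m^{1/2} \sigma_m(z)_2} \Big).
\end{aligned} \tag{3.10}$$

Then clearly $C(\Phi) < \infty$ implies condition (3.8).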
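The passage from the infimum over $\lambda$ to the minimum in (3.10) rests on an elementary one-dimensional computation. A short sketch, with $a := \|z_I\|_1$ and $b := \|z_{I^c}\|_1$ as shorthand introduced here:

```latex
% Shorthand (introduced for this sketch): a = \|z_I\|_1 \ge 0, b = \|z_{I^c}\|_1 \ge 0.
\[
\varphi(\lambda) := |\lambda - 1| \, a + |\lambda| \, b =
\begin{cases}
(1 - \lambda) a - \lambda b \ \ge a, & \lambda \le 0, \\
(1 - \lambda) a + \lambda b, & 0 \le \lambda \le 1, \\
(\lambda - 1) a + \lambda b \ \ge b, & \lambda \ge 1,
\end{cases}
\]
so $\varphi$ is affine on $[0,1]$ with $\varphi(0) = a$ and $\varphi(1) = b$, and therefore
\[
\inf_{\lambda \in \mathbb R} \big( |\lambda - 1| \, a + |\lambda| \, b \big) = \min(a, b).
\]
```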

If, in addition, we have $z \in \ell^p$ for some $0 < p < 1$, then it follows from standard results on
nonlinear approximation with bases in $\ell^2$, see [5], that $m^{1/2} \sigma_m(z)_2 \to 0$ as $m \to \infty$. Thus

$$\frac{\|z_m\|_1}{m^{1/2} \sigma_m(z)_2} \to \infty,$$

and we conclude from (3.10) that

$$\sup_{m \in \mathbb N} \frac{\sigma_m(z)_1}{m^{1/2} \sigma_m(z)_2} < \infty.$$

□

We now turn to a justification of Examples 1.2 and 1.3 using Propositions 3.1 and 3.2.
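3.2.1 Examples 1.2 and 1.3 revisited

We first verify the claim made in Example 1.2. Consider any dictionary $\Phi$ with $N(\Phi) = \mathrm{span}\{z\}$,
where $z = (z_j)_{j=1}^{\infty} \in \ell^p$ for some $0 < p < 1$. For any $\varepsilon > 0$, we modify $z$ as follows:

1. Choose $m_0 \ge 2$ such that $\sum_{j = m_0 + 1}^{\infty} |z_j|^p < \varepsilon / 2$.

2. Choose a sequence $\{m_\ell\}_{\ell=1}^{\infty}$ satisfying $m_{\ell+1} / m_\ell \to \infty$. Notice that the sequence will
necessarily have super-exponential growth.

3. Fix $\beta > 1/p > 1$, and choose $\{\gamma_\ell\}_{\ell=0}^{\infty}$ such that $\gamma_\ell := C (m_{\ell+1} - m_\ell)^{-\beta}$, with the constant
$C$ defined by the equation $\sum_{\ell=0}^{\infty} \gamma_\ell^p [m_{\ell+1} - m_\ell] = \sum_{j = m_0 + 1}^{\infty} |z_j|^p$.

4. Now define $\tilde z = (\tilde z_j)_{j=1}^{\infty}$ by

$$\tilde z_j := \begin{cases} z_j, & 1 \le j \le m_0, \\ \gamma_\ell, & j \in [m_\ell + 1, m_{\ell+1}], \ \ell \in \mathbb N_0. \end{cases}$$

It is easy to verify (using 1. and 3.) that $\|z - \tilde z\|_p^p < \varepsilon$.

Let us consider the index set $I = [1, m_\ell]$, $\ell \ge 1$. We have

$$\frac{\|\tilde z_{I^c}\|_1^2}{\|\tilde z_{I^c}\|_2^2}
= \frac{C^2 \big( \sum_{k=\ell}^{\infty} (m_{k+1} - m_k)^{1 - \beta} \big)^2}{C^2 \sum_{k=\ell}^{\infty} (m_{k+1} - m_k)^{1 - 2\beta}}
\gtrsim \frac{(m_{\ell+1} - m_\ell)^{2 - 2\beta}}{(m_{\ell+1} - m_\ell)^{1 - 2\beta}}
= m_{\ell+1} - m_\ell.$$

Thus,

$$\frac{\|\tilde z_{I^c}\|_1^2}{m_\ell \|\tilde z_{I^c}\|_2^2} \gtrsim \frac{m_{\ell+1} - m_\ell}{m_\ell} \to \infty$$

as $\ell \to \infty$. Also, since $\beta > 1$,

$$\frac{\|\tilde z_I\|_1^2}{m_\ell \|\tilde z_{I^c}\|_2^2}
\gtrsim \frac{1}{m_\ell (m_{\ell+1} - m_\ell)^{1 - 2\beta}}
\ge \frac{m_{\ell+1} - m_\ell}{m_\ell} \to \infty$$

as $\ell \to \infty$. We conclude that $C_2(\tilde z) = \infty$, with $C_2$ given in Proposition 3.2.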
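The blow-up in this construction can be observed numerically. The sketch below is an illustration under explicit assumptions that are not in the paper: the hypothetical choice $m_\ell = 2^{(\ell+1)^2}$ (so that $m_{\ell+1}/m_\ell \to \infty$), $\beta = 3$, the constant $C$ dropped because it cancels in the ratios, the infinite tails truncated after a few levels, and the first $m_0$ entries of $\tilde z$ ignored (which can only underestimate $\|\tilde z_I\|_1$). It evaluates the squared quantity inside $C_2$ for $I = [1, m_\ell]$.

```python
def example_ratios(levels=5, extra=5, beta=3):
    """Squared min-ratio from (3.8) for I = [1, m_l], l = 1..levels (truncated sums)."""
    # Hypothetical super-exponential sequence m_l = 2^((l+1)^2), l = 0, 1, ...
    m = [2 ** ((l + 1) ** 2) for l in range(levels + extra + 2)]
    gap = [m[l + 1] - m[l] for l in range(len(m) - 1)]
    ratios = []
    for l in range(1, levels + 1):
        # With gamma_k = C * gap_k^(-beta) on a block of length gap_k (C omitted):
        head_l1 = sum(g ** (1 - beta) for g in gap[:l])         # ~ ||z_I||_1, head ignored
        tail_l1 = sum(g ** (1 - beta) for g in gap[l:])         # ~ ||z_{I^c}||_1, truncated
        tail_l2_sq = sum(g ** (1 - 2 * beta) for g in gap[l:])  # ~ ||z_{I^c}||_2^2, truncated
        ratios.append(min(head_l1, tail_l1) ** 2 / (m[l] * tail_l2_sq))
    return ratios

ratios = example_ratios()
print(ratios)
```

With these choices the computed values grow roughly like $(m_{\ell+1} - m_\ell)/m_\ell$, in line with the estimates above.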

To verify the claim made in Example 1.3, suppose $\Phi$ satisfies $N(\Phi) = \mathrm{span}\{z\}$, where $z = (z_j)_{j=1}^{\infty}$ is such that there is a constant $C < \infty$ satisfying

\[ \forall k \in \mathbb{N}: \quad \sum_{j=k}^{\infty} |z_j| \le C |z_k|. \]

Then for any finite index set $I \subset \mathbb{N}$, we let $m_I := \min\{k : k \notin I\}$, and notice that $\|z_{I^c}\|_1 \le \sum_{j=m_I}^{\infty} |z_j| \le C|z_{m_I}|$, while $\|z_{I^c}\|_2 \ge |z_{m_I}|$. Hence,

\[ \frac{\|z_{I^c}\|_1}{\|z_{I^c}\|_2} \le \frac{C|z_{m_I}|}{|z_{m_I}|} = C \]

and

\[ \sup_{m \in \mathbb{N}} \sup_{I : |I| \le m} \frac{\|z_{I^c}\|_1}{m^{1/2}\|z_{I^c}\|_2} \le C < \infty, \]

so (3.2) is satisfied and the Bernstein inequality (1.10) holds by Proposition 3.1.
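As a sanity check of this argument, one can take a geometric sequence, which satisfies the tail condition with $C = 2$, and enumerate all index sets on a truncated instance. The sketch below is illustrative only: the sequence $z_j = 2^{-j}$, the truncation length `n`, and the function names are assumptions made for the example. Squared norms are compared so that exact rational arithmetic can be used.

```python
from fractions import Fraction
from itertools import combinations

n = 10                                              # truncation length (assumption)
z = [Fraction(1, 2 ** j) for j in range(1, n + 1)]  # z_j = 2^(-j)
C = 2


def tail_condition_holds(z, C):
    """Check sum_{j >= k} |z_j| <= C |z_k| for every k of the truncated sequence."""
    return all(sum(z[k:]) <= C * z[k] for k in range(len(z)))


def worst_squared_ratio(z):
    """Maximum of ||z_{I^c}||_1^2 / ||z_{I^c}||_2^2 over all proper subsets I."""
    idx = range(len(z))
    worst = Fraction(0)
    for size in range(len(z)):                      # |I| < n, so I^c is non-empty
        for I in combinations(idx, size):
            comp = [z[j] for j in idx if j not in I]
            l1 = sum(comp)
            l2_sq = sum(v * v for v in comp)
            worst = max(worst, l1 ** 2 / l2_sq)
    return worst


print(tail_condition_holds(z, C), float(worst_squared_ratio(z)))
```

The largest squared ratio stays below $C^2 = 4$, in line with the bound $\|z_{I^c}\|_1 \le C \|z_{I^c}\|_2$ derived above.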

4 Bernstein inequality and the RIP 

For certain incoherent dictionaries studied in [12], the Bernstein inequality cannot match the Jackson inequality, but it still holds with a sharp exponent $r = 2(1/\tau - 1/2)$ for any $\tau \le 2$, i.e. the sharp factor that can be used in Theorem 2.1 is $\beta = 2$. This result exploits incoherence [12, Lemma 2.3] to prove that the lower bound in the RIP is satisfied for $k$ of the order of $\sqrt{N}$. Below we prove that the lower frame bound (1.9), together with the lower bound in the RIP (1.12) with $k$ of the order of $N$, implies the Bernstein inequality (1.6) with controlled constant and exponent matching that of the Jackson inequality (1.5). This lemma therefore extends our previous result based on incoherence [12, Theorem 2.1].

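Lemma 4.1. Let $\Phi$ be an $m \times N$ dictionary. Suppose $\Phi$ has lower frame bound $A > 0$ and satisfies LRIP($\kappa N$, $\delta$), where $\delta < 1$ and $0 < \kappa < 1$. Then for $0 < \tau \le 2$, the Bernstein inequality (1.6) holds with exponent $r = 1/\tau - 1/2$ and constant

\[ C_\tau(A, \delta, \kappa) := \max\left\{ (1-\delta)^{-1/2},\ A^{-1/2}\kappa^{1/2 - 1/\tau} \right\}. \qquad (4.1) \]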

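Proof. First, suppose $1 \le k \le \kappa N$. Take $f \in \Sigma_k(\Phi)$, and write $f = \Phi c$ with $\|c\|_0 \le k$. Then, by the LRIP($\kappa N$, $\delta$) condition,

\[ \|f\|_{\ell^\tau(\Phi)} \le \|c\|_\tau \le k^{1/\tau - 1/2}\|c\|_2 \le (1-\delta)^{-1/2} k^{1/\tau - 1/2} \cdot \|f\|_{\mathcal{H}}. \]

For $\kappa N < k \le N$, take $f \in \Sigma_k(\Phi)$. We express $f$ in terms of its canonical frame expansion relative to $\Phi$,

\[ f = \sum_{j=1}^{N} \langle f, \tilde\varphi_j \rangle \varphi_j. \qquad (4.2) \]

We recall that the dual frame $\{\tilde\varphi_j\}$ has an upper frame bound $A^{-1}$. Hence, we can use the expansion (4.2) to deduce that

\[
\begin{aligned}
\|f\|_{\ell^\tau(\Phi)} &\le \big\|\{\langle f, \tilde\varphi_j \rangle\}\big\|_\tau \le N^{1/\tau - 1/2} \big\|\{\langle f, \tilde\varphi_j \rangle\}\big\|_2 \\
&\le A^{-1/2} N^{1/\tau - 1/2} \cdot \|f\|_{\mathcal{H}} \\
&\le \left[A^{-1/2}\kappa^{1/2 - 1/\tau}\right] k^{1/\tau - 1/2} \cdot \|f\|_{\mathcal{H}}.
\end{aligned}
\]

The Bernstein inequality and its constant now follow at once from the two separate estimates.

□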
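The constant (4.1) and the Hölder-type step $\|c\|_\tau \le k^{1/\tau - 1/2}\|c\|_2$ used in the first case of the proof are easy to evaluate numerically. The following is a minimal sketch; the function names and the sample vector are hypothetical and serve only as an illustration.

```python
def bernstein_constant(A, delta, kappa, tau):
    """Evaluate C_tau(A, delta, kappa) = max{(1-delta)^(-1/2), A^(-1/2) kappa^(1/2-1/tau)}."""
    if not (A > 0 and delta < 1 and 0 < kappa < 1 and 0 < tau <= 2):
        raise ValueError("need A > 0, delta < 1, 0 < kappa < 1 and 0 < tau <= 2")
    return max((1 - delta) ** -0.5, A ** -0.5 * kappa ** (0.5 - 1 / tau))


def lq_norm(c, q):
    """(Quasi-)norm ||c||_q = (sum |c_j|^q)^(1/q) for q > 0."""
    return sum(abs(x) ** q for x in c) ** (1 / q)


# Check ||c||_tau <= k^(1/tau - 1/2) ||c||_2 on a sample k-sparse vector.
c = [3.0, -1.0, 0.5, 2.0]   # k = 4 nonzero entries (illustrative values)
k = len(c)
for tau in (0.5, 1.0, 1.5, 2.0):
    lhs = lq_norm(c, tau)
    rhs = k ** (1 / tau - 0.5) * lq_norm(c, 2)
    print(tau, lhs <= rhs + 1e-12)

print(bernstein_constant(A=1.0, delta=0.75, kappa=0.25, tau=1.0))
```

Note how the second term of the maximum grows as $\kappa$ decreases or as $\tau$ decreases, which is why the lemma needs sparsity levels $k$ proportional to $N$.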

Lemma 4.1 proves half of Theorem 1.4. Let us complete the proof now.

Proof of Theorem 1.4. As we have seen, the lower frame bound and the LRIP($\kappa N$, $\delta$) property imply the Bernstein inequality for all $0 < \tau \le 2$. Moreover the upper frame bound implies a Jackson inequality. For $0 < p < \tau \le 1$ both the Jackson and Bernstein inequalities hold for $\ell^p(\Phi)$ with exponent $1/p - 1/2$, hence [6, Chapter 7] we have with equivalent norms

\[ \mathcal{A}^r_\tau(\Phi) = (\mathcal{H}, \ell^p(\Phi))_{\theta,\tau}, \qquad r = \theta(1/p - 1/2), \quad 0 < \theta < 1. \]

The Bernstein inequality also holds for $\ell^\tau(\Phi)$ with exponent $r = 1/\tau - 1/2$, hence by Theorem 2.1, $\mathcal{A}^r_\tau(\Phi) = \ell^\tau(\Phi)$ with equivalent norms. □

Next we wish to estimate $A$, $B$, $\delta$, $\kappa$ when $\Phi$ is a random Gaussian dictionary. The following
Lemma summarizes well known facts (see e.g. IHIll). 

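Lemma 4.2. Let $\Phi$ be an $m \times N$ matrix with i.i.d. Gaussian entries $\mathcal{N}(0, 1/m)$. For any $\varepsilon > 0$ and $1 \le k \le m$, it satisfies the LRIP($k$, $\delta$) with $1 - \delta = (1 - \eta)^2$, where

\[ \eta := \sqrt{\frac{k}{m}} \cdot \left( 1 + (1 + \varepsilon) \cdot \sqrt{2 \cdot \left(1 + \log \frac{N}{k}\right)} \right), \qquad (4.3) \]

except with probability at most

\[ \exp\left( -2\varepsilon k \cdot \left(1 + \log \frac{N}{k}\right) \right). \qquad (4.4) \]

Moreover, except with probability at most $\exp(-\varepsilon^2 m/2)$, it has the lower frame bound

\[ A \ge \left(\sqrt{N/m} - 1 - \varepsilon\right)^2 \qquad (4.5) \]

and, except with probability at most $\exp(-\varepsilon^2 m/2)$, it has the upper frame bound

\[ B \le \left(\sqrt{N/m} + 1 + \varepsilon\right)^2. \qquad (4.6) \]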

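Proof. First, for a given index set $\Lambda$ of cardinality $k \le m$, we observe that the restricted matrix $\Phi_\Lambda$ is $m \times k$ with i.i.d. Gaussian entries $\mathcal{N}(0, 1/m)$, hence its smallest singular value exceeds $1 - \sqrt{k/m} - t$ except with probability at most $\exp(-mt^2/2)$ [5, Theorem 11.13]. By a union bound, the smallest singular value among all submatrices $\Phi_\Lambda$ associated to the $\binom{N}{k}$ possible index sets $\Lambda$ of cardinality $k$ exceeds $1 - \sqrt{k/m} - t$, except with probability at most $p(t) := \binom{N}{k} \cdot \exp(-mt^2/2)$. Since for all $N, k$ we have $\binom{N}{k} \le (Ne/k)^k = \exp\left(k \cdot (1 + \log \frac{N}{k})\right)$, it follows that

\[ p(t) \le \exp\left( k \cdot \left(1 + \log \frac{N}{k}\right) - mt^2/2 \right). \]

For $\varepsilon > 0$ we set

\[ t := (1 + \varepsilon) \cdot \sqrt{\frac{2k}{m} \cdot \left(1 + \log \frac{N}{k}\right)} \]

and obtain that, except with probability at most

\[ p(\varepsilon) \le \exp\left( k \cdot \left(1 + \log \frac{N}{k}\right) \cdot \left(1 - (1+\varepsilon)^2\right) \right) \le \exp\left( -2\varepsilon k \cdot \left(1 + \log \frac{N}{k}\right) \right), \]

we have: for all $k$-sparse vectors $c$ with $\|c\|_0 = k$,

\[ \frac{\|\Phi c\|_2}{\|c\|_2} \ge 1 - \sqrt{\frac{k}{m}} \cdot \left( 1 + (1 + \varepsilon) \cdot \sqrt{2 \cdot \left(1 + \log \frac{N}{k}\right)} \right) = 1 - \eta. \]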
To control the frame bounds we consider the random matrix $Y := \sqrt{m/N}\,\Phi^T$. Since $Y$ is $N \times m$ with i.i.d. Gaussian entries $\mathcal{N}(0, 1/N)$, for any $t > 0$ all its singular values exceed $1 - \sqrt{m/N} - t$, except with probability at most $\exp(-Nt^2/2)$ [5, Theorem 11.13]. Setting $t = \varepsilon \cdot \sqrt{m/N}$, since $\|\Phi^T x\|_2^2 = \frac{N}{m}\|Yx\|_2^2$, we obtain that $\Phi$ has lower frame bound

\[ \sqrt{A} \ge \sqrt{N/m} \cdot \left(1 - (1 + \varepsilon) \cdot \sqrt{m/N}\right) = \sqrt{N/m} - 1 - \varepsilon \]

except with probability at most $\exp(-\varepsilon^2 m/2)$. We proceed identically for the upper frame bound, using the fact that for any $t > 0$, no singular value of $Y$ exceeds $1 + \sqrt{m/N} + t$, except with probability at most $\exp(-Nt^2/2)$ [5, Theorem 11.13]. □
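To get a feeling for the orders of magnitude in Lemma 4.2, the quantities $\eta$, the failure probability of the LRIP, and the frame bounds can be evaluated directly. The sketch below only evaluates the closed-form bounds (it does not sample random matrices); the function names and the sample dimensions are hypothetical choices made for illustration.

```python
import math


def eta(k, m, N, eps):
    """eta = sqrt(k/m) * (1 + (1+eps) * sqrt(2 * (1 + log(N/k))))."""
    return math.sqrt(k / m) * (1 + (1 + eps) * math.sqrt(2 * (1 + math.log(N / k))))


def lrip_failure_bound(k, N, eps):
    """Upper bound exp(-2 * eps * k * (1 + log(N/k))) on the LRIP failure probability."""
    return math.exp(-2 * eps * k * (1 + math.log(N / k)))


def sqrt_frame_bounds(m, N, eps):
    """Bounds on sqrt(A) (from below) and sqrt(B) (from above)."""
    root = math.sqrt(N / m)
    return root - 1 - eps, root + 1 + eps


def frame_failure_bound(m, eps):
    """Upper bound exp(-eps^2 * m / 2) on the failure probability of each frame bound."""
    return math.exp(-eps ** 2 * m / 2)


# Illustrative dimensions (assumptions): redundancy N/m = 2, sparsity k = 1.
m, N, k = 10_000, 20_000, 1
e = eta(k, m, N, eps=1.0)
print("eta =", e, " 1 - delta =", (1 - e) ** 2)
print("LRIP failure bound:", lrip_failure_bound(k, N, eps=1.0))
print("sqrt frame bounds:", sqrt_frame_bounds(m, N, eps=0.2))
print("frame failure bound:", frame_failure_bound(m, eps=0.2))
```

The bound is only informative when $\eta < 1$, which forces $k/m$ to be small; this is what the threshold $t(R)$ of Appendix A quantifies.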

We now obtain our first main theorem (Theorem 1.1) by controlling the constant $\delta$ from below when $k/m$ is bounded from above, given the redundancy $R = N/m$ of the dictionary $\Phi$.

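Proof of Theorem 1.1. In Appendix A we exhibit a threshold $t(R) \in (0, 1)$ such that if $N/m = R$ and $t = k/m \le t(R)$ then

\[ \sqrt{\frac{k}{m}} \cdot \left( 1 + 2\sqrt{2} \cdot \sqrt{1 + \log \frac{N}{k}} \right) \le 1/2. \]

Consider $k := \lfloor t(R) m \rfloor$. By Lemma 4.2 (with $\varepsilon = 1$) the dictionary $\Phi$ satisfies the LRIP($k$, $\delta$) with $(1 - \delta)^{-1/2} = (1 - \eta)^{-1} \le 2$ except with probability at most

\[ p_1 = \exp\left( -2k \cdot \left(1 + \log \frac{N}{k}\right) \right) \le \exp\left( -2[t(R)m - 1] \cdot (1 + \log R) \right) \le e^{2(1 + \log R)} \cdot \exp\left( -2t(R)(1 + \log R)\, m \right). \]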



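Moreover, setting $\varepsilon := (\sqrt{R} - 1)/2$, it has lower frame bound $\sqrt{A} \ge \sqrt{N/m} - 1 - \varepsilon \ge (\sqrt{R} - 1)/2$ except with probability at most $p_2 = \exp(-\varepsilon^2 m/2)$. For $m \ge m(R) := 2/t(R)$ we have

\[ \frac{k}{N} = \frac{k}{m} \cdot \frac{m}{N} \ge \frac{t(R) - 1/m}{R} \ge \frac{t(R)}{2R}, \]

and by Lemma 4.1 we obtain (except with probability at most $p_1 + p_2$) that the Bernstein inequality holds for each $0 < \tau \le 2$ with constant

\[ C_\tau(R) \le \max\left\{ 2,\ 2(\sqrt{R} - 1)^{-1} \left( \frac{t(R)}{2R} \right)^{1/2 - 1/\tau} \right\}. \qquad (4.7) \]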



Since we also have the upper frame bound $\sqrt{B} \le \sqrt{R} + 1 + \varepsilon'$ except with probability at most $p_3 = \exp(-(\varepsilon')^2 m/2)$, we obtain with $\varepsilon' = 1$ that the upper frame bound $\sqrt{R} + 2$ together with the Bernstein inequality with constant $C_\tau(R)$ jointly hold, except with probability at most $p_1 + p_2 + p_3 \le \beta \exp(-\gamma m)$ where

\[ \beta = e^{2 + 2\log R} + 2 = e^2 R^2 + 2 \le (e^2 + 2) R^2 \le 10 R^2, \]

\[ \gamma \ge \min\left( 2t(R) \cdot (1 + \log R),\ (\sqrt{R} - 1)^2/8,\ 1/2 \right) =: \gamma(R). \]

As shown in Appendix A, $\lim_{R \to \infty} \gamma(R) \approx 0.002$, and $\gamma(R) \ge 7 \cdot 10^{-16}$ when $R \ge 1.28$. □
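The constants appearing in this proof can be evaluated once a value of the threshold $t(R)$ is available. In the sketch below the threshold is passed in as a plain argument `t_R`, which is an assumption of the example (the proof takes it from Appendix A); the function names are likewise hypothetical.

```python
import math


def beta_bound(R):
    """beta = e^(2 + 2 log R) + 2 = e^2 R^2 + 2, as in the probability estimate."""
    return math.exp(2 + 2 * math.log(R)) + 2


def bernstein_constant_bound(R, t_R, tau):
    """Right-hand side of (4.7): max{2, 2 (sqrt(R)-1)^(-1) (t_R / (2R))^(1/2 - 1/tau)}."""
    if R <= 1 or not (0 < t_R < 1) or not (0 < tau <= 2):
        raise ValueError("need R > 1, 0 < t_R < 1 and 0 < tau <= 2")
    return max(2.0, 2.0 / (math.sqrt(R) - 1) * (t_R / (2 * R)) ** (0.5 - 1 / tau))


for R in (1.28, 2.0, 4.0, 100.0):
    # beta stays below 10 R^2 for every redundancy R >= 1.
    print(R, beta_bound(R), 10 * R ** 2)

# Illustrative threshold value t_R = 0.01 (not the value from Appendix A).
print(bernstein_constant_bound(4.0, 0.01, 2.0), bernstein_constant_bound(4.0, 0.01, 1.0))
```

For $\tau = 2$ the exponent $1/2 - 1/\tau$ vanishes and the bound does not depend on the threshold; for smaller $\tau$ a small threshold inflates the constant.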
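For $u \in (0, 1)$ we have $u \log 1/u \le e$, hence for $u \in (0, 1)$ and $0 < p \le 1$:

\[ \log 1/u = (1/p) \log 1/u^p \le (1/p)\, e/u^p = (1/u)^p (e/p). \]

Therefore, for $a \ge 1$, $b \ge e$, $0 < t \le 1$, using $u = t/b$, we obtain

\[
\begin{aligned}
\eta(t) := \sqrt{t} \cdot \left( 1 + a\sqrt{\log(b/t)} \right) &\le \sqrt{t} \cdot \left( 1 + a\sqrt{(b/t)^p (e/p)} \right) \\
&\le t^{\frac{1}{2} - \frac{p}{2}} \cdot \left( 1 + a e^{1/2} \sqrt{b^p/p} \right) \le t^{\frac{1}{2} - \frac{p}{2}} \cdot 2 a e^{1/2} \sqrt{b^p/p},
\end{aligned}
\]

where in the last inequality we used the fact that $a e^{1/2} \sqrt{b^p/p} \ge 1$ (all the factors exceed one) and $t^{p/2} \le 1$. For $p = 1/\log b$ we have $b^p/p = e \log b$ hence

\[ \eta(t) \le 2ae\sqrt{\log b} \cdot t^{\frac{1}{2} - \frac{1}{2\log b}}. \]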



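The definition of $\eta(t)$ can be identified with (4.3) for $\varepsilon = 1$ with $t = k/m$, $a = 2\sqrt{2}$ and $b = eN/m = eR \ge e$. Denoting $c = 4ae = 8\sqrt{2}e$, we have just proved

\[ \eta(t) \le (c/2) \cdot \sqrt{1 + \log R} \cdot t^{\frac{1}{2}\left(\frac{\log R}{1 + \log R}\right)}, \qquad \forall\, 0 < t \le 1. \]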



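Defining

\[ t(R) := \left[ c^2 (1 + \log R) \right]^{-\frac{1 + \log R}{\log R}} \in (0, 1), \qquad (A.1) \]

we have the guarantee $\eta(t(R)) \le 1/2$ as well as the identity

\[ 2t(R) \cdot (1 + \log R) = 2c^{-2} \cdot \left[ c^2 \cdot (1 + \log R) \right]^{-\frac{1}{\log R}}. \]

The right hand side is an increasing function of $R$, with limit zero when $R \to 1$ and limit $2c^{-2}$ when $R \to \infty$. When $R \ge R_0 := (1 + 4/c)^2$ we have $(\sqrt{R} - 1)^2/8 \ge 2c^{-2}$ hence

\[ \gamma(R) := \min\left( 2t(R) \cdot (1 + \log R),\ (\sqrt{R} - 1)^2/8 \right) = 2t(R) \cdot (1 + \log R). \]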
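Since $c = 8\sqrt{2}e = 2^{7/2}e$, we have $c^2 = 2^7 e^2$ hence

\[ 2c^{-2} = 2^{-6} e^{-2} \approx 0.0021 > 0.002, \quad \text{and} \quad R_0 = \left(1 + \frac{1}{\sqrt{8}\,e}\right)^2 \approx 1.277 > 1.27. \]

For $R \ge R_0$,

\[ \gamma(R) \ge 2c^{-2} \left[ c^2 \cdot (1 + \log R_0) \right]^{-\frac{1}{\log R_0}} \approx 7.8 \cdot 10^{-16} > 7 \cdot 10^{-16} \]

and $\lim_{R \to \infty} \gamma(R) = 2c^{-2} > 2 \cdot 10^{-3}$. Finally, when $R \ge R_0$ we have $m(R) = 2/t(R) = 4(1 + \log R)/\gamma(R) \le 6 \cdot 10^{15} \cdot (1 + \log R)$, and in the limit of large $R$ we obtain $m(R) \asymp 2c^2 (1 + \log R) \le 2000 \cdot (1 + \log R)$.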
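The threshold and the rate can also be evaluated numerically. The sketch below assumes the closed form $t(R) = [c^2(1 + \log R)]^{-(1 + \log R)/\log R}$ with $c = 8\sqrt{2}e$ and $\eta(t) = \sqrt{t}\,(1 + 2\sqrt{2}\sqrt{1 + \log(R/t)})$, as read off from the derivation above; the function names are hypothetical, and the printed values are floating-point evaluations rather than certified bounds.

```python
import math

c_const = 8 * math.sqrt(2) * math.e   # c = 4ae with a = 2*sqrt(2)


def t_threshold(R):
    """t(R) = [c^2 (1 + log R)]^(-(1 + log R) / log R), for R > 1."""
    L = 1 + math.log(R)
    return (c_const ** 2 * L) ** (-L / math.log(R))


def eta_eps1(t, R):
    """eta(t) for eps = 1: sqrt(t) * (1 + 2 sqrt(2) sqrt(log(e R / t)))."""
    return math.sqrt(t) * (1 + 2 * math.sqrt(2) * math.sqrt(1 + math.log(R / t)))


def gamma_rate(R):
    """gamma(R) = min(2 t(R) (1 + log R), (sqrt(R) - 1)^2 / 8, 1/2)."""
    return min(2 * t_threshold(R) * (1 + math.log(R)),
               (math.sqrt(R) - 1) ** 2 / 8,
               0.5)


R0 = (1 + 4 / c_const) ** 2
print("2 / c^2 =", 2 / c_const ** 2, " R0 =", R0)
for R in (1.28, 2.0, 10.0, 100.0, 1e6):
    t = t_threshold(R)
    print(R, t, eta_eps1(t, R), gamma_rate(R))
```

The output illustrates how pessimistic the constants are for redundancies close to one, and how $\gamma(R)$ slowly approaches $2c^{-2}$ as $R$ grows.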